Notes & questions

From here:

  • postgibbs thinning: “If you want to keep all the samples you have drawn, you have to put the same number as you put before. Or, you can input a multiple of the original number. For example, in this case, you can input 10, 20, 30 and so on, because the original interval was 10. If you type an inappropriate number, the program will stop with a suspicious message.”
  • The independent chain size corresponds to the number of independent samples that can be seen as independent. If the independent chain size is 3, the statistics (i.e. posterior mean and SD) are equivalently calculated using only 3 independent samples, and obviously, this is too small. This indicates whether you need more samples or not.
  • Effective sample size: “At least > 10 is recommended. > 30 may be better.”
  • You can evaluate a sufficient interval for samples to be saved using Independent chain size and Autocorrelations. Two adjacent samples are usually highly correlated because the next sample is drawn based on the current one. When the correlation is still high between distant samples, the dependency-level is also high and the absolute values of the samples should be very similar.
  • When the distribution of the samples is not normal, there is a difference between “Mode” and “Mean”

Sampling parameters:

  • 1,000,000 total samples
  • Burn-in of 50,000 samples removed in postgibbs
  • Kept every 20th sample during sampling, thinned to every 100th sample in postgibbs

Setup

gibbs_samples <-
  purrr::map2_dfr(.x = rep(c(1:10), 
                           times = 7),
                  .y = rep(c("1", "2", "3alt", "5", "7", "8", "9"),
                           times = 10),
                  ~ read_gibbs_samples(iteration = .x,
                                 region = .y)) %>% 
  select(iter, dataset, everything(), -X1, -X3) %>% 
  purrr::set_names("iter", "dataset", "round", c(1:14)) %>% 
  tidyr::pivot_longer(cols = c(`1`:`14`),
                      names_to = "param") %>% 
  mutate(param = case_when(param == "1" ~ "dir1dir1",
                           param == "2" ~ "dir1dir2",
                           param == "3" ~ "dir1mat1",
                           param == "4" ~ "dir1mat2",
                           param == "5" ~ "dir2dir2",
                           param == "6" ~ "dir2mat1",
                           param == "7" ~ "dir2mat2",
                           param == "8" ~ "mat1mat1",
                           param == "9" ~ "mat1mat2",
                           param == "10" ~ "mat2mat2",
                           param == "11" ~ "mpe1mpe1",
                           param == "12" ~ "mpe2mpe2",
                           param == "13" ~ "res1res1",
                           param == "14" ~ "res2res2"),
         iter = as.character(iter))
gibbs_mce <-
  purrr::map2_dfr(.x = rep(c(1:10), 
                           times = 7),
                  .y = rep(c("1", "2", "3alt", "5", "7", "8", "9"),
                           times = 10),
                  ~ read_gibbs_mce(iteration = .x,
                                   region = .y)) %>%
  left_join(gibbs_samples %>% 
              mutate(iter = as.integer(iter)) %>% 
              group_by(iter, dataset, param) %>% 
              tally(name = "n_samples")) %>% 
  select(iter, dataset, param, n_samples, everything()) 
gibbs_psd <-
  purrr::map2_dfr(.x = rep(c(1:10), 
                           times = 7),
                  .y = rep(c("1", "2", "3alt", "5", "7", "8", "9"),
                           times = 10),
                  ~ read_gibbs_psd(iteration = .x,
                                 region = .y)) %>%
  left_join(gibbs_samples %>% 
              mutate(iter = as.integer(iter)) %>% 
              group_by(iter, dataset, param) %>% 
              tally(name = "n_samples")) %>% 
  select(iter, dataset, param, n_samples, everything()) 

Monte Carlo error by time series

  • Note some parameters with fewer than 10 effective samples - is this cause for concern?

Posterior standard deviation

  • Note also some parameters with fewer than 10 independent batches
    • I don’t understand the difference between “effective samples” and “independent batches”

Plot post-burnin and post-thinning samples

plot_gibbs_iter <-
  function(df) {
    df %>% 
      filter(!stringr::str_detect(param, "res|mpe")) %>% 
      ggplot(aes(x = round,
                 y = value,
                 color = param)) +
      geom_line() +
      ggsci::scale_color_ucscgb() +
      theme_classic() +
      facet_wrap(~ dataset, nrow = 3)
  }
plot_gibbs_region <- 
  function(df) {
    df %>% 
      filter(!stringr::str_detect(param, "res|mpe")) %>% 
      mutate(iter = as.numeric(iter)) %>% 
      arrange(iter) %>% 
      mutate(iter = glue("Iteration {iter}"),
             iter = forcats::fct_inorder(as.factor(iter))) %>%
      ggplot(aes(x = round, 
                 y = value,
                 color = param)) +
      geom_line() +
      theme_classic() +
      ggsci::scale_color_ucscgb() +
      labs(x = "Round", 
           y = "Value",
           color = "Parameter") +
      facet_wrap(~ iter, ncol = 2)
  }

Desert

gibbs_samples %>% 
  filter(dataset == "3v1") %>% 
  plot_gibbs_region()

Southeast

gibbs_samples %>% 
  filter(dataset == "3v2") %>% 
  plot_gibbs_region()

High Plains, control

gibbs_samples %>% 
  filter(dataset == "3v3alt") %>% 
  plot_gibbs_region()

Arid Prairie

gibbs_samples %>% 
  filter(dataset == "3v5") %>% 
  plot_gibbs_region()

Forested Mountains

gibbs_samples %>% 
  filter(dataset == "3v7") %>% 
  plot_gibbs_region()

Fescue Belt

gibbs_samples %>% 
  filter(dataset == "3v8") %>% 
  plot_gibbs_region()

Upper Midwest & Northeast

gibbs_samples %>% 
  filter(dataset == "3v9") %>% 
  plot_gibbs_region()